Patch-based analysis of visual speech from multiple views

نویسندگان

  • Patrick Lucey
  • Gerasimos Potamianos
  • Sridha Sridharan
چکیده

Obtaining a robust feature representation of visual speech is of crucial importance in the design of audio-visual automatic speech recognition systems. In the literature, when visual appearance based features are employed for this purpose, they are typically extracted using a “holistic” approach. Namely, a transformation of the pixel values of the entire region-of-interest (ROI) is obtained, with the ROI covering the speaker’s mouth and often surrounding facial area. In this paper, we instead consider a “patch” based visual feature extraction approach, within the appearance based framework. In particular, we conduct a novel analysis to determine which areas (patches) of the mouth ROI are the most informative for visual speech. Furthermore, we extend this analysis beyond the traditional frontal views, by investigating profile views as well. Not surprisingly, and for both frontal and profile views, we conclude that the central mouth patches are the most informative, but less so than the holistic features of the entire ROI. Nevertheless, fusion of holistic and the best patch based features further improves visual speech recognition performance, compared to either feature set alone. Finally, we discuss scenarios where the patch based approach may be preferable to holistic features.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining Multiple Views for Visual Speech Recognition

Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for the visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual...

متن کامل

P65: Speech Recognition Based on Bbrain Signals by the Quantum Support Vector Machine for Inflammatory Patient ALS

People communicate with each other by exchanging verbal and visual expressions. However, paralyzed patients with various neurological diseases such as amyotrophic lateral sclerosis and cerebral ischemia have difficulties in daily communications because they cannot control their body voluntarily. In this context, brain-computer interface (BCI) has been studied as a tool of communication for thes...

متن کامل

The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...

متن کامل

The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...

متن کامل

End-to-End Multi-View Lipreading

Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading. However, the vast majority of recent lipreading works, including the deep learning approaches which significantly outperform traditional approaches, have focused on frontal mouth images. As a consequence, research on joint learning of visual features and speech classificatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008